ABSTRACT

This paper deals with problems of measuring change in
motor behavior. Conventional measurement procedures and statistical
analyses involving change are presented in the first section. This
section discusses difference scores as the criterion measure and the
use of all scores as the dependent variables. This latter category
involves using either univariate or multivariate analysis of variance
(ANOVA, MANOVA). The author gives suggestions for generally
conservative researchers who want to use the methods discussed, but
the second section describes alternative methods for analyzing
change. The statistical techniques described are gradually being
adopted by associated disciplines and are probably more appropriate
for describing and predicting performance over time. This second
section includes stochastic processes, time-series, factor analytic
models of change, and curve-fitting as a change indicator. An
appendix provides empirical comparisons among the numerous
statistical methods described. (PB)

"Nothing endures but change"

(Heraclitus, 500 B.C.)

"There is nothing in this world constant, but inconstancy"

(Jonathan Swift, 1707)

Change, the focus of many disciplines (history, geology, anthropology) and
central to almost all scientific research, is such a necessary process and yet
such a difficult one to measure. Science demands empiricism and, if we wish to
infer causality, this empiricism must be in the form of controlled experimentation,
often leading to a pre-post type of design with a resulting change or difference
score. The problems inherent in the measurement of change have been well expounded
(Dotson, 1973; Harris, 1963; Stelmach, 1975) but the solutions seem slow to develop.
Bereiter (1963) has claimed that it is only in this area that he has heard of
researchers abandoning research objectives due to the lack of suitable statistical
procedures available. The task of providing valid solutions to the problems of
measuring change is obviously a formidable one, and is certainly not going to be
accomplished in this paper. What is presented is a rather empirical account of
the available methods for analyzing performance over time, the advantages and
disadvantages of these procedures, and some biased, personal decisions regarding the
"best" solutions. The discussion is dichotomized into two general approaches, one
involving the common difference scores and repeated measures designs with their
associated parametric statistical procedures, and the other approach focusing on
alternate, less common ways to study change; specifically, stochastic methods,
time-series analysis, factor analytic procedures, and curve fitting.

A. CONVENTIONAL MEASUREMENT PROCEDURES AND STATISTICAL ANALYSES INVOLVING CHANGE

An overview of the educational and psychological literature dealing with the
problems encountered in measuring change reflects that an indication of change is
usually provided by two scores only, a pre-test and a post-test score, interspersed
with some treatment condition or time lapse. However, research in sport and physical
activity often results in a large number of responses per subject, rather than just
a pair of scores, thus allowing for a greater variety of possible designs and
analyses. Consequently, it is necessary to examine the conventional measurement of
change as two distinct processes, one involving a difference score, the other
utilizing all the data in a repeated measures design.



1. Difference Scores as the Criterion Measure

a) Selection of a Criterion Score (unadjusted).

If the research methodology utilized yields a single score on the
first administration of a test (X1) and another single score on a
repetition of that test at some subsequent point in time (X2), then there is
little choice in the criterion score to use if the researcher wishes to
use a single, unadjusted, dependent variable. It has to be this difference
(D = X2 - X1), which has many inherent deficiencies and numerous possible
transformations to reduce these deficiencies (none of which are very
satisfactory). These are discussed in section 1(b). A more likely situation,
however, is when there are a number of observations available for
each S (e.g., heart rate at each minute of a 15-minute exercise bout, 30
learning trials), but the investigator wishes to reduce this data to a
single change score or learning score. The problems then confronting him
are: (1) how many trials should he use to estimate both the initial and
final states of the Ss?, and (2) should he use the best, or the average,
of each of these sets of trials? Before commenting on some possible
solutions to these two problems, it should be noted that neither of these
problems should ever arise when dealing with the analysis of change.
Discarding or reducing data, when suitable statistical methods are available
for analyzing all available data, seems like very inefficient research.
If the goal is to be able to understand motor behavior, for purposes of
explanation and prediction, then one must look at all the data, and analyze
it by a repeated measures ANOVA, time series, or some other equally suitable
tool. However, many investigators insist on obtaining a single change
score, thus some discussion on these points seems necessary.

The problem of choosing between the best and the average score has
only one acceptable solution - use the average. There is sufficient
support for use of the average rather than the best in the general case
(Baumgartner, 1974; Henry, 1967; Kroll, 1967) and in the specific case of
difference scores it is even more necessary. The reliability of a difference
score is so dependent upon the reliability of the two scores which
produce this difference, that it is imperative that these two scores
possess maximum reliability themselves; thus averages are necessary.



The solution to the question of the optimal number of trials to use
in computing these pre and post-score averages is not quite so unambiguous.
The problem facing an investigator who uses a learning task is how can he
choose a score which maximizes both reliability and discriminability at
the same time? In a task which has, say, 20 trials, the difference between
trial one and trial 20 will probably show the greatest discriminability
as far as learning is concerned; however, it may not be very reliable.
If one uses the average of the first ten trials as an indication of initial
score, and the average of the last ten as the performance score, then the
difference between these two may show high reliability, but it probably
will not show much learning. Carron and Marteniuk (1970) pointed out the
necessity for comparing the differences between both the reliabilities and
discriminability obtained by grouping trials in different ways. Others
(Baumgartner and Jackson, 1970; McGraw and McClenney, 1965) have attempted
to give definitive rules for determining the number of trials and the
measurement schedules one should employ. Because of the great variability
in type of task, characteristics of Ss, etc., it does not seem possible
to choose a specific rule for determining the "best" criterion measure
for all situations - even for all situations involving a specific task or
set of measures. If one decides that it is necessary to reduce the data
to a single dependent variable (which, to this writer, does not seem to
be a valid procedure), then utilizing procedures as suggested by Carron
and Marteniuk (1970), and following the basic principles of reliability
and validity of dependent variable scores which have been frequently and
explicitly laid out for us (e.g., Alexander, 1947; Burt, 1955; Feldt and
McKee, 1957; Krause, 1969; Lomnicki, 1973; Schutz and Roy, 1973) one should
be able to arrive at a procedure for selecting the most suitable criterion
score in each specific situation.

b) Selection of a Criterion Score (adjusted).

In situations where there are only two opportunities for observation
and measurement (pre and post), or where the investigator insists on
reducing repeated measures to a pre-post case, then it is probably necessary
to apply some type of statistical adjustment or correction factor to either
the difference score or to the final score. The following section gives
possible solutions for each of a number of common problems associated with
using difference scores.



These problems have been well defined by many investigators (Bereiter, 1963;
Cronbach and Furby, 1970; Lord, 1956, 1963; McNemar

(i) Problem 1. Regression Effect: In general, on the second
administration of a test, and in the absence of any true change or treatment effect,
the observed scores for those who scored high on test #1 tend to decline and
the observed scores of those who scored lowest on test #1 tend to increase
on test #2.
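The regression effect can be illustrated with a small simulation. The sketch below is not part of the original analysis; it assumes a stable true score per subject plus independent, normally distributed measurement error on each administration, and the sample size, standard deviations, and seed are arbitrary illustrative choices.

```python
import random
import statistics


def simulate_regression_effect(n=2000, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate two administrations of a test with NO true change.

    Each observed score is a stable true score plus independent
    measurement error, so any pre-post shift is pure regression effect.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    x1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    x2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Rank subjects on the first administration only.
    order = sorted(range(n), key=lambda i: x1[i])
    tenth = n // 10
    low, high = order[:tenth], order[-tenth:]

    def mean_change(indices):
        return statistics.mean(x2[i] - x1[i] for i in indices)

    return mean_change(low), mean_change(high)


low_change, high_change = simulate_regression_effect()
print(f"lowest decile on test #1, mean change:  {low_change:+.2f}")
print(f"highest decile on test #1, mean change: {high_change:+.2f}")
```

Under these assumptions the lowest scorers on test #1 gain on average and the highest scorers lose on average, even though no subject's true score changed.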

Solutions. The most valid, and least complicated, solution is to use
a homogeneous group so all Ss have essentially the same initial score. If
the experiment involves comparisons between groups, then equate the group
means initially, either by randomization with large sample sizes, blocking,
matching, or statistically through analysis of covariance (these methods are
discussed below in Section 2(c)).



Another possible solution, the one to which psychometricians have directed
their attention, is to adjust the final score on the basis of the
pre-post linear regression effect. This can be done by fitting a regression
line to the pre-post scores (X1, X2) under the conditions of the null
hypothesis, i.e., no treatment effect, and then using deviation from the
regression line as the dependent variable indicating true change (Lord,
1963). This requires either a separate control group or an (X1, X2) measure
for each subject under a treatment condition and a control condition - a
procedure which is not always possible. The most reasonable solution seems
to be to use analysis of covariance (ANCOVA) as it is essentially an analysis
of the X2 scores, adjusted on the basis of the regression line between X2
and X1.

(ii) Problem 2. Measurement Errors or the Unreliability-Invalidity
Dilemma: The degree to which measurement errors exist in the initial and/or
final measures, along with the degree to which the X1, X2 correlation
exceeds zero, is reflected by a reduction in the reliability of the X1-X2
difference score.

Solutions. There exists a wealth of information on possible solutions
to this problem (e.g., Lord, 1956, 1963; McNemar, 1958; Ng, 1974; Tucker,
1966; Wiley and Wiley, 1974).



The basic thesis of all these articles is that it is possible to compute
a reliability coefficient "corrected for attenuation", that is, the
reliability of a difference between "true scores" (errorless measures yielding
reliabilities of 1.00 in both X1 and X2). Once having obtained a
reliable estimate of true difference it is then possible to use this
attenuated reliability coefficient and multiply it by the observed X2-X1
difference (but scaled as deviations from the means), thus obtaining a
hypothetical true difference score or "regressed score" (McNemar, 1958).
Although this is the basis of the solutions advocated by many
psychometricians it has its deficiencies, the primary one being that the number of
alternate ways to compute this true gain score seems to be exceeded only
by the number of papers written on the topic. The non-specialist is left
with a morass of equations and confusion. Another deficiency with the
use of estimated true difference scores is that the regression coefficient
used in the predictor equation is based on a number of assumptions, some
of which may not always hold true. A recent report by Wiley and Wiley
(1974) indicates that the assumption of independence of errors of
measurement between tests is frequently violated, thus giving overestimates of
the attenuated reliability coefficient. This in turn would result in
overestimates of the true gain score.
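The dependence of difference-score reliability on the pre-post correlation can be made concrete with the classical textbook formula for the reliability of a difference. The sketch below is an illustration only, not the specific procedure of any of the authors cited above; the function names are hypothetical, and the regressed score adds the mean back to the shrunken deviation, which is an assumption about how the scaled score would be reported.

```python
def difference_reliability(r11, r22, r12, s1=1.0, s2=1.0):
    """Classical reliability of the difference D = X2 - X1.

    r11, r22: reliabilities of the pre and post scores
    r12:      pre-post correlation
    s1, s2:   standard deviations of the pre and post scores
    """
    numerator = r11 * s1 ** 2 + r22 * s2 ** 2 - 2 * r12 * s1 * s2
    denominator = s1 ** 2 + s2 ** 2 - 2 * r12 * s1 * s2
    if denominator <= 0:
        raise ValueError("difference score has no variance")
    return numerator / denominator


def regressed_difference(d, mean_d, r_dd):
    """Shrink an observed difference toward the mean difference.

    The deviation (d - mean_d) is multiplied by the reliability of the
    difference; adding mean_d back is an illustrative assumption.
    """
    return mean_d + r_dd * (d - mean_d)


# Two tests with reliability .80 each: the higher the pre-post
# correlation, the lower the reliability of the difference.
for r12 in (0.0, 0.3, 0.6):
    r_dd = difference_reliability(0.8, 0.8, r12)
    print(f"pre-post r = {r12:.1f} -> reliability of D = {r_dd:.2f}")
```

With equal variances and reliabilities of .80, a pre-post correlation of .60 leaves a difference-score reliability of about .50, which is the reduction described under Problem 2.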

(iii) Problem 3. Equality of Scale Along the Range of Scores (the
Physicalism-Subjectivism Dilemma): An observed score at the low range
of the continuum may be measuring an attribute of behavior quite different
from that which is reflected by the same test at the high end of the range
of scores.

Solutions: There seem to be no adequate solutions per se for this
problem. One could use P-technique methodology (a sort of factor analysis
appropriate for change data) to test the assumption that the two measures
are in fact measuring the same thing (Bereiter, 1963; Cattell, 1963).
However, this is not a solution, but rather a technique to reveal the
existence or non-existence of a problem. The answer seems to be in
finding ways to avoid the problem rather than solve it - and this can
be accomplished to a limited degree. If all groups are equated initially
with respect to their scores on the dependent variable, then any
differences between groups in the amount of change within groups can be logically
interpreted (Schmidt, 1972). This restriction allows for the conclusion
that one group changed more, or less, with regards to the particular
dependent variable being used. If one group showed very large changes, and
the other group very small ones, then it may be difficult to interpret the
meaning of the relative magnitudes of change scores, but it is still
possible to state that one group showed significantly greater change than
the other group on that particular trait.

A General Solution to the Problems Associated with Difference Scores:

At this point the reader must be wondering, "Is there no adequate solution to
the problem of measuring change?" My answer is "Yes", there are adequate methods,
but not through the use of difference scores. If one must use a change score,
then perhaps the "best" estimator of a true difference score is Cronbach and Furby's
"complete estimator" (1970):

$$\hat{D}_\infty = \hat{X}_{2\infty} - \hat{X}_{1\infty}$$

where $\hat{D}_\infty$ is the "true difference score", and $\hat{X}_{1\infty}$ is the
true score at time 1, taking into account numerous other categories of variables (W)
which may be multivariate in nature and relate to the pre or post scores in some
manner. The true score for X1 is estimated as:

$$\hat{X}_{1\infty} = \rho_{X_1 X_1'} X_1 + \frac{\sigma_{X_{1\infty}(X_2 \cdot X_1)}}{\sigma^2_{(X_2 \cdot X_1)}} (X_2 \cdot X_1) + \frac{\sigma_{X_{1\infty}(W \cdot X_1, X_2)}}{\sigma^2_{(W \cdot X_1, X_2)}} (W \cdot X_1, X_2) + \text{constant}$$

where (X2 · X1) and (W · X1, X2) are partial variates. The purpose of presenting
this equation is not to provide the reader with a useful statistical tool, but
rather to point out the extreme degree to which the raw data can be transformed if
one wishes some sort of pure measure. The difficulty in interpreting this
transformed score is obvious - at least in terms of predictable observed behavior.

Two quotes provide a suitable summary of this investigator's position on the
use of difference scores:

"Both the history of the problem and the logic of investigation
indicate that the last thing one wants to do is think in terms of
or compute such change scores unless the problem makes it absolutely
necessary." (Nunnally, 1973, p. 87)

"Gain scores are rarely useful, no matter how they may be
adjusted or refined." (Cronbach and Furby, 1970, p. 68)



2. The Use of All Scores as the Dependent Variables

The analysis of all of the available data should provide an investigator with
more information than does the limited, and suspect, information provided in a
difference score. These repeated measures analyses may be performed by either
univariate or multivariate analysis of variance (ANOVA, MANOVA) on the raw scores
or on scores adjusted for initial differences between groups. The more
information available on the nature of change in behavior over time, the greater should
be the degree of understanding of the nature and causes of that change.
Consequently, in an experiment involving any length of time between the initiation of
the treatment and the final observation, it is desirable to take numerous measures
per S. Although in some cases it is not possible to do this, either due to the
contamination effect of the measurement tool or to the nature of the treatment
procedures, in most motor behavior studies such repeated measures are quite
feasible.

a) Repeated Measures ANOVA.

The common method for analyzing change for a repeated measures design is
through a repeated measures or Ss x Treatments ANOVA. Given a typical
experiment involving two treatment groups (or a treatment and control) with 20
Ss nested within each group and repeated across, say, 10 trials (Fig. 1), one
appropriate method for analyzing change could be to break down the total
variability as given in Table I.

[Insert Fig. 1 and Table I about here] 

The effects of most interest here, with respect to the analysis of change,
are the Groups x Trials interaction and its trend analysis components,
Groups x Trials (Linear) and Groups x Trials (Quadratic). The Groups x Trials
interaction indicates the degree to which the change over trials is the same
for each group - which is probably the research question of most interest;
i.e., is there a significant change in behavior over the time span of the
experiment, and, if so, does this change show the same, or different,
characteristics between the two experimental groups?

Fig. 1. Schemata of 2 x 10 Factorial Experiment with
Repeated Measures on the Second Factor.

TABLE I

Analysis of Variance, with Trend, for a 2 x 10 Factorial Experiment
with Repeated Measures on the Second Factor

Source                 df    Mean Square    F Ratio

Groups                  1    MS_G           MS_G / MS_S(G)
Ss within Groups       38    MS_S(G)
Trials                  9    MS_T           MS_T / MS_S(G)T
  Linear                1    MS_TL          MS_TL / MS_S(G)T
  Quadratic             1    MS_TQ          MS_TQ / MS_S(G)T
  Residual              7    MS_TR          MS_TR / MS_S(G)T
Groups x Trials         9    MS_GT          MS_GT / MS_S(G)T
  G x T Lin.            1    MS_GTL         MS_GTL / MS_S(G)T
  G x T Quad.           1    MS_GTQ         MS_GTQ / MS_S(G)T
  G x T Resid.          7    MS_GTR         MS_GTR / MS_S(G)T
SwG x Trials          342    MS_S(G)T

Total                 399
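The degrees of freedom in Table I follow directly from the design (2 groups, 20 Ss per group, 10 trials). As a check, the partition can be computed with a short sketch; the function name and dictionary layout are illustrative choices, not part of the original paper.

```python
def repeated_measures_df(groups, subjects_per_group, trials):
    """Degrees of freedom for a Groups x Trials design with
    Ss nested within groups and repeated measures on trials."""
    df = {
        "Groups": groups - 1,
        "Ss within Groups": groups * (subjects_per_group - 1),
        "Trials": trials - 1,
        "Groups x Trials": (groups - 1) * (trials - 1),
        "SwG x Trials": groups * (subjects_per_group - 1) * (trials - 1),
    }
    # Total d.f. is one less than the number of observations.
    df["Total"] = groups * subjects_per_group * trials - 1
    return df


table_df = repeated_measures_df(groups=2, subjects_per_group=20, trials=10)
for source, value in table_df.items():
    print(f"{source:<18}{value:>5}")
```

For the 2 x 10 design with 20 Ss per group this gives 1, 38, 9, 9, and 342 d.f., which sum to the total of 399. The linear, quadratic, and residual components split the 9 d.f. for Trials (and for Groups x Trials) into 1, 1, and 7.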



The Groups x Trials (Linear) asks essentially the same question, but with the
constraint that the change over time is linear. In this case a linear function
is forced on the data and the test of significance tests for equality of slope
between the two groups, which in behavioral terms amounts to a comparison of
the rates of learning, rates of recovery, etc. Similarly the Groups x Trials
(Quadratic) compares the two treatment groups on the basis of the degree of
curvature or time of plateauing of the scores over time.



This analysis then provides one possible solution for the analysis of
change suitable for many experimental conditions. By using a number of measures
instead of just two, the problems of regression effects
are greatly reduced. The unreliability of the data is reflected by the
magnitude of the S x Trials interaction (or in this case the S(G) x T) and is thus
a sort of built-in protection against making erroneous research conclusions
based on unreliable data. The less reliable the data is, the larger the
S x Trials error term, the more difficult it is to attain statistical
significance and the less likely it is to make a Type I error.

The repeated measures ANOVA is not the ideal solution to the problem of
analyzing change, however, for a number of reasons. Firstly, the tests of
significance give limited information regarding the nature or form of the
change over time, as the trend analyses fit only polynomials to the data,
data which is frequently better fitted by a logarithmic or exponential
function. Secondly, it deals with mean values only and does not reveal reliable
differences between subjects (within the same group) with respect to
intra-individual behavioral changes over time (a stochastic model would detect this).
Finally, and perhaps most importantly, the nature of the data common to most
studies in motor behavior is such that it violates the assumptions on which
the repeated measures ANOVA is founded. These assumptions are that the
measures (i) are normally distributed, (ii) exhibit equal variances under all
treatment conditions, and (iii) have equal covariances between all treatment
pairs (the precise mathematical assumption is that all covariances equal zero
but the F ratio is virtually unaffected by violation of this assumption,
providing all covariances are equal). While the first two of these assumptions
are usually met with motor performance data, the third one rarely is.



This assumption can be casually tested by examining the correlation matrix of
the repeated measures - the degree to which all correlations are not equal
indicates the degree to which this assumption is violated. It is frequently
the case in our field of study to obtain data in which adjacent trial
correlations are very high, but diminish as a function of the number of intervening
observations between any two measures. The result of this situation is an
inflated F value and a substantial increase in the probability of committing a
Type I error (as high as p = .15 when assuming a p = .05).

The analysis of variance for repeated measures, which was first presented
here as a possible solution to some of the problems inherent in the analysis
of change, has now become a problem itself. There are two possible ways by
which ANOVA may be validly used on repeated measures data which exhibits
unequal between-trial correlations:

(1) Inflate the magnitude of the F needed for significance by reducing
the associated degrees of freedom (d.f.). Box (1954) has suggested
that the d.f. for both the numerator and denominator be multiplied
by a factor ε, which is a function of the degree of heterogeneity of
both the variances and the covariances. The greater the heterogeneity,
the smaller the calculated ε and the larger the F value must be in
order to reject the null hypothesis.

(2) Greenhouse and Geisser (1959) questioned the validity of the estimator
ε and its effect on the approximated distribution. They suggested
the use of the minimum possible value of ε, namely 1/(k-1), where k
is the number of levels of the repeated factor, as the factor which
should be applied to the d.f. in all situations. Although this is a
statistically valid technique it is very conservative, thus resulting
in a rather large probability of committing a Type II error.
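The two adjustments can be sketched as follows. The expression used for ε is the form commonly given in textbooks for Box's correction, computed from a k x k variance-covariance matrix of the repeated measures; it is offered here as an assumption rather than as the exact computation in the cited papers, and the example matrices are invented for illustration.

```python
def box_epsilon(cov):
    """Box's epsilon from a k x k variance-covariance matrix
    (textbook form; equals 1 when all variances are equal and
    all covariances are equal)."""
    k = len(cov)
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    diag_mean = sum(cov[i][i] for i in range(k)) / k
    row_means = [sum(row) / k for row in cov]
    sum_sq = sum(v * v for row in cov for v in row)
    sum_row_sq = sum(m * m for m in row_means)
    numerator = (k * (diag_mean - grand_mean)) ** 2
    denominator = (k - 1) * (
        sum_sq - 2 * k * sum_row_sq + k * k * grand_mean ** 2
    )
    return numerator / denominator


def lower_bound_epsilon(k):
    """Minimum possible epsilon for k levels of the repeated factor."""
    return 1.0 / (k - 1)


def adjusted_df(df_numerator, df_denominator, epsilon):
    """Multiply both d.f. by epsilon before consulting the F table."""
    return epsilon * df_numerator, epsilon * df_denominator


# Equal variances and equal covariances: no correction needed.
uniform = [[2.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
# Correlations that decay with the number of intervening trials.
decaying = [[0.8 ** abs(i - j) for j in range(4)] for i in range(4)]

print(round(box_epsilon(uniform), 3))
print(round(box_epsilon(decaying), 3))

# Conservative test for the Trials effect of Table I (9 and 342 d.f.).
df1, df2 = adjusted_df(9, 342, lower_bound_epsilon(10))
print(round(df1, 3), round(df2, 3))
```

With the conservative factor for k = 10 trials, the Trials effect of Table I would be evaluated on approximately 1 and 38 d.f. instead of 9 and 342, which shows how much power the lower-bound approach can give up.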

There are a number of excellent articles available which provide a lucid
explanation of both the problem and the merits of these solutions (e.g.,
Davidson, 1972; Gaito, 1973; Gaito and Wiley, 1963; McCall and Appelbaum,
1973; Mendoza, Toothaker and Nicewander, 1974).

Procedures for statistical tests of this assumption are available in
Winer (1971, p. 594).

b) Repeated Measures MANOVA.

The other solution to the problem of non-homogeneity of covariances is to
use a technique which does not require this assumption - namely the multivariate
analysis of variance. MANOVA requires no assumptions regarding the homogeneity
of covariances and allows for an exact statistical test based on a known
significance level. Although this technique has been available for many years,
it has not been adopted by practicing researchers due to its extreme
computational complexity. However, the present accessibility of suitable computerized
multivariate statistical packages at most universities has eliminated such an
excuse for ignoring this very useful test and it should now be a standard
statistical tool for all researchers. Very briefly, what MANOVA does is to
transform the k repeated measures for each subject into a set of (k-1) scores
through the application of independent contrasts (these are usually orthogonal
polynomials, but they need not be as the resulting significance test is
independent of the choice of contrasts). An analysis of variance type procedure
is then carried out on the vector of means of these derived scores with the
mean square error being a variance-covariance matrix of within cell variabilities
rather than a unitary scalar value as in the univariate procedure. The tests
of significance provide an F ratio for the overall multivariate hypothesis
that the trial means are equal, and for a two group experiment, that the change
in performance across repeated measures is the same for each group. An overall
significant F on these multivariate hypotheses allows the investigator to use
appropriate follow-up tests while maintaining an overall pre-determined level
of significance. These follow-up procedures can take the form of simultaneous
confidence intervals, step-down F ratios, or even the usual univariate F tests
on each dependent variable separately or on the single d.f. contrasts associated
with trend analysis.
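The transformation step can be illustrated for the first two orthogonal polynomial contrasts. This is a minimal sketch assuming equally spaced trials; it derives only the linear and quadratic contrast scores (a full analysis would use all k-1 contrasts), and the function names and example scores are hypothetical.

```python
def polynomial_contrasts(k):
    """Linear and quadratic contrast coefficients for k equally
    spaced trials (unnormalized, mutually orthogonal, each summing to 0)."""
    centre = (k - 1) / 2
    linear = [t - centre for t in range(k)]
    mean_square = sum(c * c for c in linear) / k
    quadratic = [c * c - mean_square for c in linear]
    return linear, quadratic


def contrast_scores(scores, contrasts):
    """Convert one subject's k repeated measures into one derived
    score per contrast (the weighted sum of the trial scores)."""
    return [sum(c * x for c, x in zip(contrast, scores))
            for contrast in contrasts]


linear, quadratic = polynomial_contrasts(5)
print(linear)     # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(quadratic)  # [2.0, -1.0, -2.0, -1.0, 2.0]

steady = [1, 2, 3, 4, 5]            # constant rate of improvement
plateau = [1, 3, 4, 4.5, 4.75]      # early gains, then levelling off
print(contrast_scores(steady, (linear, quadratic)))   # [10.0, 0.0]
print(contrast_scores(plateau, (linear, quadratic)))  # [9.0, -4.0]
```

The subject who improves at a constant rate has a quadratic score of zero, while the subject who plateaus has a negative quadratic score. It is the vector of such derived scores, not the raw trial scores, that enters the multivariate test.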

Another frequently used procedure associated with MANOVA is discriminant
analysis, which tests whether two or more groups can be significantly separated
on the bases of their profiles (or, in the RM design, their pattern of change
over time). It has been shown, however, that a Groups x Trials ANOVA is more
versatile in detecting the nature of the differences between group profiles
than is discriminant analysis (Thomas and Chissom, 1973). Although Thomas
and Chissom failed to consider the restrictive assumption inherent in the
univariate G x T ANOVA, this is not a factor if the Trials effect is broken
down into polynomial coefficients (linear, quadratic, etc.).
This essentially converts the univariate procedure to a multivariate technique
and thus no longer requires the assumption of equal covariances. Bock (1963),
Cole and Grizzle (1966), and Finn (1969) have provided comprehensive discussions
on the application of MANOVA to repeated measures data, and comparisons of the
applications and outcomes of ANOVA versus MANOVA are well given by Davidson
(1972), Hummel and Sligo (1971), McCall and Appelbaum (1973), and Poor (1973).

c) Experimental and Statistical Adjustments for ANOVA and MANOVA.

As was stated above, a number of the problems associated with the
measurement of change can be reduced if all treatment groups are initially equal with
respect to the dependent variables. The four procedures available for
achieving this initial equality are: random assignment, balancing or matching,
blocking, and analysis of covariance.

(i) Random Assignment. This, theoretically, is the best way as it equates
groups initially with respect to all variables. Unfortunately, the success
of random assignment is dependent upon the size of the samples and the
population variability of the independent variable of interest. Samples of size
100 almost guarantee equality (but it is never a certainty), whereas samples
of size 5 are rather unlikely to result in equal distributions among the
treatment groups.

Some investigators advocate the random assignment of Ss to treatment groups
followed by a t test (sometimes with an exaggerated alpha, say .90) to determine
if the hypothesis of initial equality can be accepted (Rosemier, 1968). If
the hypothesis of initial equality is not tenable, then the investigator needs
to either reassign Ss to treatments, increase his sample size in the hope that
the randomization process will eventually work, or adjust his groups with some
type of balancing procedure. None of these procedures are very satisfactory,
from a statistical as well as a procedural aspect.

(ii) Balancing and/or Matching. These procedures involve assigning Ss
to treatments on the basis of their initial scores (or some other related
variable) in an attempt to equate groups initially. It has been shown that
matching is always less efficient than analysis of covariance and is usually
less efficient than simple random sampling (Billewicz, 1965).
It has also been shown (Finney, 1957) that matching is never as suitable as
blocking. The obvious recommendation is that these procedures should not be
used anymore in our empirical research studies.

(iii) Blocking. Blocking, when done on the basis of initial scores, is
essentially the same idea as balancing; however, the sampling and assignment
procedures are quite different, thus making blocking a statistically sound
procedure. Correct blocking technique requires knowledge of the distribution
of the blocking variable in the population, an a priori determination of the
cut-off values which determine the blocking levels, and then sampling from
each of these population strata to form the blocks.

(iv) Analysis of Covariance. Whereas blocking provides an experimental
method of equating groups, analysis of covariance provides a statistical
method for doing so. The choice between these two techniques is not a simple
one, as the relative advantages of one procedure over the other depend upon
the degree of relationship between the concomitant variable (used for blocking
or as the covariate) and the dependent variable. Feldt (1958) has shown that
if the correlation between the concomitant variable and dependent variable is
less than .60, blocking is better, whereas if it is greater than .80, analysis
of covariance provides a more powerful statistical test. However, Feldt
suggests that even with high correlations, blocking is preferable as the
relatively small advantage in precision shown by analysis of covariance is
more than lost due to the strict assumptions of regression inherent in
covariance; i.e., linearity of regression, and equality of regression within
treatment groups.
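The statistical equating performed by analysis of covariance can be sketched through its adjusted means: each group's post-test mean is moved along the pooled within-group regression line to the grand mean of the covariate. The sketch below assumes exactly the two conditions named above (linear regression and equal slopes within groups); the data and function names are invented for illustration, and no significance test is included.

```python
import statistics


def pooled_within_slope(groups):
    """Pooled within-group regression slope of post on pre.

    groups: dict mapping group name -> list of (pre, post) pairs.
    """
    sum_xy = 0.0
    sum_xx = 0.0
    for pairs in groups.values():
        mean_x = statistics.mean(x for x, _ in pairs)
        mean_y = statistics.mean(y for _, y in pairs)
        sum_xy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        sum_xx += sum((x - mean_x) ** 2 for x, _ in pairs)
    return sum_xy / sum_xx


def adjusted_means(groups):
    """Post-test means adjusted to a common initial (pre-test) level."""
    slope = pooled_within_slope(groups)
    grand_pre = statistics.mean(
        x for pairs in groups.values() for x, _ in pairs
    )
    result = {}
    for name, pairs in groups.items():
        mean_x = statistics.mean(x for x, _ in pairs)
        mean_y = statistics.mean(y for _, y in pairs)
        result[name] = mean_y - slope * (mean_x - grand_pre)
    return result


# Hypothetical data: the treatment group starts 4 points higher,
# gains 4 points; the control group gains 1 point.
data = {
    "control": [(10, 11), (12, 13), (14, 15), (16, 17)],
    "treatment": [(14, 18), (16, 20), (18, 22), (20, 24)],
}
adjusted = adjusted_means(data)
print(adjusted)
```

In this constructed example the raw post-test means differ by 7 points, but the adjusted means differ by 3, the part of the difference not attributable to the groups' unequal starting levels.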


A Suttinary of Section A 

For those generally conservative researchers who wish to restrict their 
statistical analyses to conventional parametric techniques, here are some .guide- 
lines: ■ . " 

1. Use MANOVA •? with trend analysis and covar .lance If necessary. 

a) Obtain a series of measures on each subject throughout the treatment 

period when change Is expected. 
«b) Use at least 20 more subjects. than there are measures per subject. 

c) Test for equality of groups Initially— If they are not equal, then 
use the Initial score as a covar late. 

d) Analyse the data using MAN0VA„^pX4)cedure6. 



2. If difference scores are required for experimental or theoretical reasons,
   then make the best of them as follows:

   a) Attempt to maximize the reliabilities of the pre-test and post-test
      scores and minimize the pre-post correlation (while, at the same time,
      maintaining equality of meaning between the two sets of scores).
   b) Equate the groups initially, as best as possible, either through
      randomization with a large N, or through blocking on relevant variables.
   c) Compute the reliabilities of the difference scores so the data may be
      interpreted with the required caution.
   d) Analyze the data with a t-test or ANOVA.

B. ALTERNATE METHODS

Within the discipline of human kinetics, the usual methods for analyzing change
are the deterministic, parametric methods discussed above. However, there are a
number of alternate statistical techniques gradually being adopted by associated
disciplines which may not be as precise in terms of hypothesis testing, but are
probably more appropriate for describing and predicting performance over time.
Included in these procedures are three techniques which have potential as useful
statistical tools for the analysis of change of motor performance data; namely,
stochastic methods and the associated time-series analyses, factor analytic
techniques for measuring change, and curve-fitting.

1. Stochastic Processes

A stochastic variable, which may be defined as a time-dependent variable,
refers to any dependent measure which is observed repeatedly over time. Thus,
in this context, all change scores are stochastic to some extent. A more
general use of the term stochastic, however, is through its association with
stochastic processes and Markov chains - a series of time-dependent events
which are related to each other by a transition probability. A transition
matrix, composed of a number of these transition probabilities, defines the
probability that a dependent variable or measure will make a change (of some
specified magnitude) during the time between two successive observations.
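
As a minimal sketch of how the transition probabilities in such a matrix might be estimated from data, the Python fragment below tallies the one-step transitions observed across subjects and divides each tally by its row total. The state labels, the sample sequences, and the function name `transition_matrix` are hypothetical and serve only as illustration; they are not taken from any study cited here.

```python
from collections import Counter


def transition_matrix(sequences, states):
    """Estimate one-step transition probabilities from observed state sequences."""
    counts = Counter()
    for seq in sequences:
        # Pair each observation with the one that follows it.
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        # A state that is never left has no estimable row; report None.
        matrix[i] = {
            j: (counts[(i, j)] / row_total if row_total else None)
            for j in states
        }
    return matrix


# Hypothetical attitude data: U = unfavorable, I = indifferent, F = favorable.
sequences = [
    ["U", "I", "I", "F", "F"],
    ["I", "F", "F", "F", "F"],
    ["U", "U", "I", "F", "F"],
]
P = transition_matrix(sequences, ["U", "I", "F"])
```

Each row of `P` sums to one, so a row can be read as the estimated distribution of the next state given the present one.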

Stochastic processes are usually described in terms of a set of discrete
states and a set of one-step transition probabilities.







The states are classifications of the variables under observation, such as the
number of errors made in a learning task, the attitude of an individual toward
physical activity at a certain point in time (e.g., favorable, indifferent,
unfavorable), or a heart rate at various stages of activity (altered from a
ratio scale to an ordinal scale or nominal classification). In general,
stochastic processes can be divided into four distinct classes: (a) discrete
state - discrete time, (b) discrete state - continuous time, (c) continuous
state - discrete time, and (d) continuous state - continuous time. Type (a)
is the process most commonly applied to models in the behavioral sciences, as
measurement, calculation, and interpretation all become more difficult in the
continuous cases. Queueing processes, and the birth-death processes of ecology
and genetics, are examples of the second type. The third and fourth types of
stochastic processes are less commonly used (see Bailey, 1964; or Karlin, 1966,
for examples).

The statistical analysis of change data through the application of stochastic
models will yield both descriptive and inferential statistics which can
help the researcher test his theories. Descriptive statistics of interest are
such values as: the transition probabilities themselves (and comparisons among
transition probabilities under different experimental conditions); the
asymptotic value of a transition probability from time one to some very distant
time; the expected number of trials before learning, fatigue or some such
absorption state occurs; and the probability of being in some particular state
at a specified point in time. These statistics, which are calculated directly
from the observed data, can then be compared with theoretical values calculated
from the theorems of a model. Such comparisons prove very helpful in isolating
faulty assumptions in the theory. For example, it could happen that the
observed values for the total number of errors and the number of times the process
was in a particular state both agreed closely with the theoretical values, but
the observed variance of the mean number of errors deviated substantially from
the theoretical value. This would suggest that perhaps a more realistic model
could be developed by using, say, a four state process rather than the two or
three state one originally hypothesized.
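
The theoretical side of such a comparison can be sketched in a few lines of Python. The fragment below assumes a hypothetical three-state learning chain with one absorbing state (the transition values are invented for illustration) and computes two of the quantities listed above: the probability of being in each state after a given number of trials, and the expected number of trials before absorption. Repeated substitution is used here only because it needs no matrix library; it is one of several ways to solve the same equations.

```python
def step(dist, P):
    """Advance a distribution over states by one trial (row vector times P)."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def state_probabilities(start, P, trials):
    """Probability of being in each state after the given number of trials."""
    dist = list(start)
    for _ in range(trials):
        dist = step(dist, P)
    return dist


def expected_trials_to_absorption(P, absorbing, sweeps=500):
    """Expected trials before absorption, from each starting state.

    Solves t_i = 1 + sum_j P[i][j] * t_j (with t_i = 0 for absorbing states)
    by repeated substitution, which converges when absorption is certain.
    """
    n = len(P)
    t = [0.0] * n
    for _ in range(sweeps):
        t = [0.0 if i in absorbing
             else 1.0 + sum(P[i][j] * t[j] for j in range(n))
             for i in range(n)]
    return t


# Hypothetical chain: 0 = unlearned, 1 = partly learned, 2 = learned (absorbing).
P = [
    [0.6, 0.3, 0.1],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
after_two = state_probabilities([1.0, 0.0, 0.0], P, 2)
long_run = state_probabilities([1.0, 0.0, 0.0], P, 200)
waits = expected_trials_to_absorption(P, absorbing={2})
```

Values such as `after_two` and `waits` are the model's predictions; the corresponding observed proportions and trial counts would be set beside them to see where the model and the data part company.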

The application of inferential statistics requires knowing the distribution
of the particular test statistics before any probability statements can be made.
Such distributions have been established by Anderson and Goodman (1957) for
making statistical inferences about Markov chains (a stochastic process in
which the probability of transition from one state to another is dependent only
upon the state of the process at the previous time). Knowing these
distributions (they are all asymptotically distributed as chi-square with various
degrees of freedom), the following null hypotheses may be tested:

a) The transition probability is independent of t; that is, test the
stationarity of the process to see if the transition from one state
to another is the same no matter what trial it occurs on.

b) The stochastic process defined by treatment group one is the same
Markov chain as the process defined by treatment group two. If this
hypothesis is rejected and in fact the group one data fit a first-order
chain and group two a second-order chain, this tells the researcher
that the trial-to-trial scores for group two exhibit a greater degree
of dependency on the past than do the scores for group one.

c) In a process involving two sets of states, the transition probabilities
in one set are independent of those in the other set of states. For
example, the two state spaces may be levels of respiratory rate and
levels of heart rate during continuous exercise. This hypothesis tests
whether the sequence of changes in respiratory rate is independent of
the sequence of changes in heart rate. This is not at all the same as
the usual hypothesis, which tests whether a series of discrete respiratory
rates is independent of a series of discrete heart rates.
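
Hypothesis (a) can be illustrated with a short Python sketch of a likelihood-ratio statistic for stationarity. It assumes the transition counts have been tabulated separately for each trial-to-trial step, and it uses the degrees of freedom (T - 1) x m x (m - 1) for T steps and m states, which holds only when every transition is possible and every row has observations. The function name and the count tables are hypothetical, and the resulting statistic would still have to be referred to a chi-square table.

```python
import math


def stationarity_lr(counts_by_trial):
    """Likelihood-ratio statistic for constant transition probabilities.

    counts_by_trial[t][i][j] is the number of subjects moving from state i
    to state j at step t. Returns (statistic, degrees of freedom).
    """
    T = len(counts_by_trial)
    m = len(counts_by_trial[0])
    # Pool the counts over all steps to estimate the stationary matrix.
    pooled = [[sum(c[i][j] for c in counts_by_trial) for j in range(m)]
              for i in range(m)]
    stat = 0.0
    for c in counts_by_trial:
        for i in range(m):
            row_t, row_pooled = sum(c[i]), sum(pooled[i])
            for j in range(m):
                if c[i][j] > 0:  # empty cells contribute nothing
                    p_t = c[i][j] / row_t
                    p_pooled = pooled[i][j] / row_pooled
                    stat += 2.0 * c[i][j] * math.log(p_t / p_pooled)
    return stat, (T - 1) * m * (m - 1)


# Hypothetical two-state counts over three trial-to-trial steps.
stable = [[[8, 2], [3, 7]], [[8, 2], [3, 7]], [[8, 2], [3, 7]]]
drifting = [[[8, 2], [3, 7]], [[5, 5], [3, 7]], [[2, 8], [3, 7]]]
stat_stable, df_stable = stationarity_lr(stable)
stat_drifting, df_drifting = stationarity_lr(drifting)
```

A statistic near zero, as for `stable`, is what a stationary process produces; a large value relative to its degrees of freedom, as one would look for with `drifting`, argues against stationarity.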


Once the statistics as predicted by the model have been compared with the ones
calculated from the observed data, the investigator has a good indication of the
adequacy of his model. More specifically, if the data do not agree with the
model, he can tell exactly where the model and the data were incompatible and
make the necessary adjustments to the appropriate theorems or assumptions of
the model. Barring a very gross misrepresentation of the data by the model,
it is not necessary to discard the whole theory. In general, lack of agreement
between the model and the observed data may be due to one or more of the
following: inappropriateness of the model (the model requires a change in
theorems or assumptions), errors in the design and execution of the experiment
(perhaps better experimental controls will eliminate the effect of some
extraneous variables), or a flaw in the theory upon which the model was based
(the model-observation discrepancy should suggest the appropriate theoretical
revisions).



Stochastic methods have been used rather extensively in psychology, primarily
in the area of learning (see, for example, Greeno and Bjork, 1973, who list
243 references dealing with mathematical learning theory, a large number of
which are stochastic in nature) and to a lesser extent in sociology (Carlson,
[illegible]; [illegible], 1973). In the area of sport and physical activity we
are just beginning to examine the possibilities of stochastic methods, but so
far have very little empirical support for their usefulness over the more
conventional statistical techniques. Schutz (1970a) has described their potential
on a theoretical basis, and provided an example of their practicality as an
analytical tool in evaluating scoring systems (1970b). However, he has also
provided an interesting example of how a behavioral theory can be represented by
a rather complex stochastic model which leads to nothing but confusion and
mathematical merry-go-rounds (Schutz, 1971). Guppy and Fraser (1973), by using a
Markov model to examine occupational mobility in professional sport, showed that
baseball players have differential mobility rates according to race. Other ongoing
research (by Rennick at the University of Washington and Salmela at the
University of Laval) may provide us with further examples of the advantages of
stochastic processes, but until such time as a number of published research
articles appear which clearly show that stochastic methods provide greater
insight into the interpretation of empirical data than do standard statistical
procedures, their general adoption cannot be recommended.

2. Time-Series

A time-series experiment involves repeated measures on one or more
individuals over a period of time; thus the resultant observations for each individual
are time dependent and usually correlated (in effect, stochastic). Under these
conditions repeated measures ANOVA procedures are not appropriate, and, unless
the sample size is large relative to the number of observations per individual,
neither is MANOVA. Methods for analyzing data from time-series experiments,
which may be done on repeated measures from a single individual or on the means
of a number of individuals, have developed rather recently and consequently
have not yet been used extensively in empirical research. Statistical models
for testing the significance of the change in level of a nonstationary
time-series and for comparing time-series among different treatment groups have been
proposed by a number of statisticians in the past ten years (Box and Tiao, 1965;
Jones, Crowell and Kapuniai, 1970; Glass and Maguire, 1968; Gottman, McFall and
Barnett, 1969; Shumway, 1970; Strahan, 1971).



Most of these methods involve rather complex matrix manipulations and for
this reason, along with the fact that they have not been used to any extent
in empirical research studies, they are not recommended to the
non-statistician in our field at this time. One time-series procedure which has been
shown to be useful, however, is the autocorrelation (serial correlation), which
provides an indication of the degree of sequential dependencies among the
successive observations. A serial correlation of lag one (r1) is obtained by
pairing the first observation with the second, the second with the third,
etc., and then calculating the product-moment correlation coefficient on these
n-1 pairs (n being the number of repeated observations). Similarly, serial
correlations of lag 2, 3, etc. (r2, r3) can be calculated. Each coefficient
by itself gives some information on the serial dependencies in the data, and
if one were to plot r_t against t (the time-lag) for successively increasing
values of t, the resultant graph, or correlogram, would indicate the change
in serial dependencies throughout the total series.
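
The pairing procedure just described translates directly into code. The following Python sketch computes the lagged product-moment correlation exactly as defined above (observation k paired with observation k + lag) and collects the coefficients for a correlogram. The function names and the two short series are hypothetical examples, chosen because their serial correlations are obvious by inspection.

```python
import math


def pearson(x, y):
    """Ordinary product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def serial_correlation(series, lag):
    """Correlate each observation with the one `lag` steps later (n - lag pairs)."""
    return pearson(series[:-lag], series[lag:])


def correlogram(series, max_lag):
    """Serial correlations r1, r2, ..., r_max_lag, ready to plot against the lag."""
    return [serial_correlation(series, k) for k in range(1, max_lag + 1)]


# Hypothetical series: a steady trend and a strictly alternating score.
trend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
r_trend = correlogram(trend, 3)
r_alternating = correlogram(alternating, 2)
```

A steady trend gives serial correlations of one at every lag, whereas the alternating series swings from minus one at lag one to plus one at lag two; real performance data would fall between these extremes.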


Correlograms are particularly useful for experimental situations in which
a series of repeated measures are obtained before and after a treatment is
administered. A change in the nature or degree of the serial dependencies
following administration of treatment indicates a significant treatment
effect (the test for statistical significance of a serial correlation
coefficient is the same as that for an ordinary product-moment correlation
coefficient). Other appropriate situations for utilizing a time-series are
the two-group case, in which the correlograms of the two groups can be compared,
and experiments involving measures on a number of dependent variables at each
point in time. This latter case lends itself to multiple time-series analysis,
involving cross correlations (serial) between the variables, and thus tests
the extent to which trial-to-trial variation in one variable can be attributed
to concomitant trial-to-trial variation in another variable (Holtzman, 1963).



3. Factor Analytic Models of Change 

The use of various factor analytic methodologies as a powerful tool for
analyzing change has been advocated by a number of researchers, especially
those in developmental psychology (e.g., Baltes and Nesselroade, 1973; Bentler,
1973) and educational psychology (Corballis, 1970; Harris, 1963). These
procedures, requiring multiple measures at each point in time, may involve the
comparison of factor loadings and factor scores between time periods (Corballis,
1970), or may require an extension of the usual two-way data matrix (subjects
by variables) to a three-way data matrix (the third variable being occasions)
and its resultant, and rather complex, factor structure (Tucker, 1963). In
motor behavior research we usually restrict our dependent variables to a few
(five or fewer), and thus factor analytic procedures are not appropriate. For
this reason (along with the fact that this investigator has had no previous
experience with factor analytic change models), and also because it is not
appropriate for the data used for the empirical examples given in this paper,
no further discussion of factor analysis and its associated image and canonical
analyses is presented here. Readers interested in this method are encouraged
to read the references previously mentioned and attempt to apply these methods
to motor behavior data. Unfortunately we are all somewhat reluctant to attempt
a new technique until we are provided with empirical evidence that it will tell
us something that the conventional, established procedures do not. Factor
analytic change models may be useful tools - we need someone to prove this to
us.

4. Curve-Fitting as a Change Indicator

The fitting of a mathematical function to a set of points spaced along a
time continuum can be done in a number of ways, the most common of which is the
previously mentioned trend analysis. Trend analysis will fit a set of
orthogonal polynomial coefficients to a series of trial means, yielding an F ratio
for each degree polynomial. This provides an estimate of the degree of linearity,
quadratic curvature, etc., displayed by a series of points, or, if there is more
than one treatment group, the groups x trials (linear), etc., effects indicate
the difference between groups in the nature of the change in performance over
the total series of trials. While this procedure is adequate if the data are
indeed of a polynomial nature, two problems arise if they are not. Firstly, it
is obvious that if the data can be better represented by an exponential or
logarithmic function, then the best fitting polynomial is less than adequate.



The second problem associated with using trend analysis is that the curve is
fitted to the trial means rather than to the individual scores of each trial
for each subject. If the data are in fact polynomial in nature, then the
function fitting the trial means will be identical to that obtained by finding the
function for each subject and then averaging the coefficients of the individual
equations. However, if the data are exponential, then the curve of the means
may grossly distort the typical individual curve in that it will smooth out
reliable and consistent discontinuities in the data. This has been clearly
shown by Merrill (1931) with growth data, and by Sidman (1952) with learning
scores.
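
The contrast between the polynomial and exponential cases can be checked numerically. The Python sketch below uses invented scores for two subjects. It first shows that a least-squares straight line (a first-degree polynomial) fitted to the trial means has the same coefficients as the average of the two individual fits. It then shows one form of distortion in the exponential case: the curve of the means differs from the exponential curve built from the averaged parameters. This illustrates the general point only, not the specific data of Merrill or Sidman.

```python
import math


def fit_line(t, y):
    """Least-squares intercept and slope for y against t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    slope = (sum((a - mt) * (b - my) for a, b in zip(t, y))
             / sum((a - mt) ** 2 for a in t))
    return my - slope * mt, slope


def expo(a, b, c, t):
    """Exponential performance curve y = a + b * exp(-c * t)."""
    return a + b * math.exp(-c * t)


trials = list(range(10))

# Polynomial case: two hypothetical subjects with roughly linear scores.
subject_a = [20.0 - 1.0 * t + 0.5 * (-1) ** t for t in trials]
subject_b = [25.0 - 2.0 * t + 0.3 * (t % 3) for t in trials]
mean_scores = [(u + v) / 2 for u, v in zip(subject_a, subject_b)]
fit_a, fit_b = fit_line(trials, subject_a), fit_line(trials, subject_b)
fit_of_means = fit_line(trials, mean_scores)
mean_of_fits = ((fit_a[0] + fit_b[0]) / 2, (fit_a[1] + fit_b[1]) / 2)

# Exponential case: same a and b, but a fast (c = 0.9) and a slow (c = 0.1) learner.
fast = [expo(5.0, 20.0, 0.9, t) for t in trials]
slow = [expo(5.0, 20.0, 0.1, t) for t in trials]
curve_of_means = [(u + v) / 2 for u, v in zip(fast, slow)]
# Curve built from the averaged rate, (0.9 + 0.1) / 2 = 0.5.
averaged_parameter_curve = [expo(5.0, 20.0, 0.5, t) for t in trials]
gap = max(abs(u - v) for u, v in zip(curve_of_means, averaged_parameter_curve))
```

Here `fit_of_means` and `mean_of_fits` agree to rounding error, while `gap` is several score units, so the curve of the means is not the curve of a subject with average parameters.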

Examples of fitting non-polynomial curves to motor behavior data are not
uncommon; however, in most cases the investigators restrict their analyses to
descriptive techniques only, that is, they find the best fitting function, and
attempt to interpret it in a subjective manner. Furthermore, it seems that at
times researchers attempt to find the best fitting mathematical function
without regard to the theoretical meaning associated with the parameters of
the derived function. While it is true that such a function may be useful
for predictive purposes, it is of little value in the description and
explanation of behavior.


The procedures recommended here for analyzing change through the application
of curve-fitting are as follows:

a) Select, a priori, the type of function which best represents the
underlying physiological or psychological process hypothesized. The function
should be simple enough so that the parameters are interpretable and
can differentiate between treatment groups. For example, the
exponential function

    y = a + b e^{-ct}

represents a negatively decelerating function suitable for a number of
motor performance data sets. The parameter a reflects the asymptotic
value of y (its minimum in this case, which will be reached eventually),
the parameter b indicates the total change in y from time zero to
asymptote, and c describes the rate of change in y with respect to
time t.



b) After collecting the data, fit this function to the series of data
points for each subject. Thus each subject now has three dependent
variable scores, a value for each of a, b, and c.

c) Determine the percent of variance accounted for by the function and
either accept it or reject it on the basis of an a priori cut-off level.

d) Assuming that there are two or more treatment groups, ANOVA can now
be performed on each of the three dependent variables, providing
tests of hypotheses on differences among the groups with respect to
asymptotic performance, total amount of change, and rate of change.
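
Steps (b) and (c) can be sketched in Python as follows. The sketch assumes the function y = a + b e^{-ct} with c positive and uses a simple device that needs no fitting library: for each candidate rate c on a grid, a and b follow from ordinary linear regression of y on e^{-ct}, and the rate giving the smallest residual sum of squares is kept. The grid, the cut-off of .90, the function name, and the subject scores are all hypothetical; a dedicated nonlinear least-squares routine would normally replace the grid search.

```python
import math


def fit_exponential(times, scores, rates):
    """Fit y = a + b*exp(-c*t); return (a, b, c, proportion of variance explained)."""
    n = len(times)
    mean_y = sum(scores) / n
    total_ss = sum((y - mean_y) ** 2 for y in scores)
    best = None
    for c in rates:
        # With c fixed, the model is a straight line in x = exp(-c*t).
        x = [math.exp(-c * t) for t in times]
        mean_x = sum(x) / n
        sxx = sum((v - mean_x) ** 2 for v in x)
        b = sum((v - mean_x) * (y - mean_y) for v, y in zip(x, scores)) / sxx
        a = mean_y - b * mean_x
        sse = sum((y - (a + b * v)) ** 2 for v, y in zip(x, scores))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    sse, a, b, c = best
    return a, b, c, 1.0 - sse / total_ss


times = list(range(12))
rates = [k / 100 for k in range(1, 301)]  # candidate values of c: 0.01 to 3.00

# Hypothetical subject whose scores follow a = 5, b = 20, c = 0.3 without error.
scores = [5.0 + 20.0 * math.exp(-0.3 * t) for t in times]
a, b, c, r_squared = fit_exponential(times, scores, rates)

CUT_OFF = 0.90  # hypothetical a priori level for step (c)
accepted = r_squared >= CUT_OFF
```

Repeating the fit for every subject yields one a, one b, and one c per person, and those three columns are the dependent variables passed to the ANOVA in step (d).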


A general theoretical explanation of these procedures, along with
suggestions for more sophisticated techniques, is provided by Snee (1972), and Henry
and DeMoor's (1950) article gives an excellent example of this type of
methodology.



APPENDIX 


The following tables and figure are provided for empirical comparisons among
the numerous statistical methods suggested in this paper. Three sets of data were
computer generated, each one simulating a 2 x 20 factorial design with repeated
measures on the second factor. Factor one represents two treatment groups
(n = 30/group), and Factor two can be considered a days or trials factor. The
three experiments represent different conditions of the variance-covariance matrix
but all had the same means and variances (a constant variance of 10.0 for all
trials, and means ranging from 7.0 to 31.0). Fig. 2 shows the trial means for
each treatment group, and Table 1 gives the exact values for each case. The three
cases representing different covariance structures are:

Case 1: A constant covariance of 2.0 between all pairs of trials, thus
        yielding an r_ij = .2 for all trials i, j (i, j = 1, ..., 20; i ≠ j).

Case 2: A constant covariance of 8.0 between all pairs of trials (r = .8).

Case 3: A varying covariance, ranging from 9.0 for adjacent trials to 1.0
        for trials 15 or more steps apart (r = .9 to .1).


Tables 2 and 3 give the F ratios and error variances for each of the
statistical tests commonly used to analyze change. The t tests at the top of Table 2
are included as this procedure is used occasionally, even though it is completely
invalid. In this two-t-test procedure a t value is calculated on the difference
(Post test - Pre test) for group I and another t for group II. A subjective
assessment is then made on the relative magnitude of the two t's. The other
tests are standard statistical procedures using various forms of the dependent
variable (difference scores, trial 20 minus trial 1; difference scores, mean of
trials 18-20 minus mean of trials 1-3; final score; all scores).

Table 4 shows the autocorrelations for lags of one to ten for each group,
within each case.
Fig. 2. Graph of Performance Change